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Abstract - To solve extremely large problem sizes in electromagnetics, on the order of 
millions of unknowns, a novel matrix compression technique, Adaptive Cross 
Approximation (AC A), is applied to the Method of Moments (MoM). To implement 
ACA, several major enhancements to the MoM were required: spatial grouping of 
unknowns, ACA compression of the impedance matrix, Lower/Upper (LU) compressed 
factorization factors and ACA compression of the right hand sides. 

When unknowns for electrically large bodies are spatially grouped into local block 
regions, the blocks of the Method of Moments (MOM) Z matrix and its Lower/Upper 
(LU) factors are comprised of block sub matrices which, except for the diagonal blocks, 
can be well approximated by low rank matrices. These low rank approximates can then 
be computed using the Adaptive Cross Approximation (ACA), a technique which has 
very significant reduction in memory and operations count. And for monostatic 
scattering of the above system, where there are many right hand sides, the blocked right 
hand side V and current solution J can also be expressed in terms of low rank 
approximations. This report describes a novel approach in utilizing the Adaptive Cross 
Approximation technique to factor the Z matrix, Z = LU and to back-solve the blocked 
MOM system equation for a frequency domain electric field integral equation for 3D 
PEC surfaces. The integral equation uses standard Rao Wilton Glisson basis and test 
functions so that the resulting matrix is symmetric. 

Compressible matrices and their low rank approximations fundamentally mean that most 
of the blocked MOM system matrix equation elements, before compression, contain very 
little physical information. The ACA can be used to extract and compress the system 
equation elements, keeping only the necessary physics. The ACA can be used for all 
steps of the solution: filling the Z matrix: LU factoring; and LU solve. This results, 
depending on problem size, in orders of magnitude reduction of memory and run times 
and means that one can use inexpensive computer resources for problems that previously 
took super computers or could not be solved at all. 


NASA funded validation and problem size extensions, earlier portions funded by Mrs. Shaeffer and 
indirectly by the Office of Naval Research. 


It is also important to point out that this ACA LU factorization and solution is completely 
within the context of standard Method of Moments. There are no requirements for new 
basis or test functions, no requirements for far or near field distinction, no requirements 
for auxiliary “equivalent” sources on a grid, and no requirement for analytic Green’s 
function expansion. 


2 



I. Introduction 


The solution to Maxwell’s frequency domain equations in integral form using the electric 
field integral equations (EFIE), magnetic field integral equations (MFIE), or combined 
field integral equations (CFIE) is very well established using the Method Of Moments 
(MOM) matrix formulation approach. In the MOM, unknowns and test functions are 
often sub-sectional sampled on a sub wavelength scale resulting in dense matrices of size 
N, the number of unknown current coefficients. This fact has been the limiting feature in 
applying MOM to electrically large bodies due to the tyranny of N 2 for matrix fill time 
and storage and of N 3 time for LU factorization, that is the factorization of Z into a 
product of a Lower triangle matrix times an Upper triangle matrix. Computational time 
for LU factorization thus grows as the sixth or ninth power of body size for surface or 
volume problems respectively. Solve cost varies as N 2 time for each right hand side and 
becomes a particularly severe issue for electrically large scatterers as one needs ever- 
increasing backscatter angles to adequately sample the scattering pattern. 

Researchers have made various approaches at solving this electrically large body problem 
using so-called “fast” methods. These have been mostly based on the fact that when 
unknowns N are grouped in local spatial regions, the resulting off-diagonal blocks of the 
system impedance matrix Z can be approximated by low rank matrices. An alternate 
view is that these matrix blocks from the MOM formulation, where sampling rate 
requirements are based for near-near interactions, are extremely over-specified in terms 
of the number of matrix elements per block required for distant interactions. Matrix 
compression 1 is one way to significantly reduce the number of elements required to 
represent the physics contained in non-self block matrices. Advantage is then taken of 
this feature, in a variety of approaches, to speed the solution. A good review of existing 
“fast” methods for electromagnetic surface integral equations such as FMM, AIM, pFFT, 
MLFMM, IES3 has been presented by Zhao, Vouvakis and Lee [1]. 

References [1-5] have used various approaches to compute low rank approximates of the 
blocked system Z matrix. Each reduced matrix fill time and storage. Iterative solvers 
were then used to compute current solutions for each right hand side excitation vector. 

Iterative solvers may be quite satisfactory for only a few right hand sides such as antenna 
or bistatic scattering problems, but for monostatic scattering with many required 
sampling angles, this part of the problem becomes expensive; and there are often 
convergence issues with such solvers, particularly with geometries with many mutual 
interactions such as cavities. 

Another “fast” solution is the Simply Sparse approach [7-9], which takes a change of 
basis approach. The MOM basis and test functions are transformed using unitary 
orthogonal matrices. This results in only a small fraction of the unknowns radiating to 
the far field. Resulting blocks of the system matrix are upper comer dense. This 
property is then used to efficiently LU factor and solve the system equations, [10]. 


1 Compressible means that these matrices can be well approximated by low rank matrices. 
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The problem all of these “fast” methods have overcome, each in their own way, is that of 
reducing time and memory storage required to fill blocks of the Z matrix. Bebendorf [2] 
has recently introduced the Adaptive Cross Approximation (ACA) that attacks this 
problem in a rather interesting and purely matrix algebra approach. His ACA algorithm 
computes compressed forms of low rank block matrices using only a few rows and 
columns of that matrix. This technique has very significant ramifications: a) the matrix 
block does not require full computation or fill. Only selected rows and columns are 
needed; b) the resultant matrix can be stored in compressed format significantly reducing 
memory storage; c) use of compressed matrices in computations very significantly 
reduces operations count; and d) the operations count required by the ACA algorithm is 
small. 

A number of authors [1-5, 9] have now reported use of this ACA technique for filling 
system matrix blocks, including this author. It was during the work reported in [9], that it 
was observed that when the unknowns are spatially grouped, not only were the off- 
diagonal blocks of Z compressible, but also the off-diagonal blocks of the LU factored 
matrix were of compressible. This later observation then raised the question: Could the 
ACA be used to directly LU factor and solve the system equations without explicit matrix 
fill? That question is the topic of this report. 

This report starts by positing that when unknowns are spatially grouped, the off-diagonal 
blocks of Z and its LU factored form are able to be well approximated by low rank 
matrices expressed in outer product UV form. In this outer product UV form, we say the 
matrix block is compressed. (The diagonal blocks of the matrix are not compressible.) 
This means that such matrix blocks can be expressed as the outer product of a column 
dominant matrix U times a row dominant 2 matrix V, e.g., if A is an m x n matrix and can 
be represented by a low rank matrix, i.e., A is compressible, A can be approximated by 
the outer product of Au Av where the column dominant matrix Au is m x k and Av is the 
row dominant matrix k x n. Low rank means k « m, n thus the storage requirement for 
Au and Av is very much less than the storage requirement for A. And, the operations 
count involving A in its compressed low rank form AuAv can be very much less than 
using A in full form. 

Thus, spatially grouped off-diagonal blocks of the system impedance matrix Z can be 
expressed in the outer product form Z = Zu Zv and its LU factors can be similarly 
expressed as Lu Lv and Uu Uv. Further, for monostatic scattering, where there are many 
RHS incident plane wave-forcing functions, blocks of V can be compressed as Vv Vu 
and the final current solution blocks J are also can be compressed as Ju Jv. 

In the work that follows we will show results for the frequency domain 3D EFIE Surface 
Integral Equation MOM using RWG triangle basis functions [11, 12] with Galerkin test 
functions so that a symmetric block system results where the unknowns have been 
grouped into local spatial regions. The self and near self-terms are computed using the 


2 A column dominant matrix has fewer columns than rows; a row dominant matrix has fewer rows than 
columns. 
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radial-angular approach described in [13]. All computations were done in single 
precision. 


This report is organized as follows: 

Section 2 discusses spatial grouping of unknowns, low rank approximations, rank fraction 
and the singular value decomposition (SVD) character of compressible matrices. 

Section 3 presents rank fraction compressibility maps for various parts of the MOM 
system matrix, and the Z, LU, V, and J matrices for the SLICY and Open Pipe 
geometries. 

Section 4 briefly reviews the ACA method, describes how it can be used for LU 
factorization, for fdling the RHS monostatic voltage matrix, and for the LU forward and 
back solve. Memory and operations count efficiencies are discussed. 

Section 5 presents validation run results against another well-known code for the SLICY 
geometry. 

Section 6 presents a complexity study for the open pipe geometry in terms of matrix fill 
time, memory, LU factor and RHS solve times. 

Section 7 compares run specifics for similar SLICY and open pipe cases. 

Appendix presents the formulas for block symmetric LU factorization and solution. 


II. Unknown Grouping, Compressible Matrix Blocks and Their Low Rank 
Approximation 

MOM system equations are full rank, but when unknowns are grouped into local spatial 
regions, the off diagonal block-block or region-region sub matrices are compressible 
when the regions are spaced some distance apart, see Figure 1. Typical sub-sectional 
MOM formulations sample the unknown current density at 7 to 20 samples per 
wavelength (k) and 50- 400 samples per k 2 for wire and surface problems respectively. 
This sampling is required to adequately compute near-near interactions, but is clearly 
overkill for distant-distant interactions as evidenced by the compressible nature of these 
block matrices. 

Unknowns in this work have been grouped into local regions using a cobblestone 
distance sorting technique as follows. Create an array of all unsorted spatial locations in 
3D space. From this array of spatial points, find the minimum and maximum (x, y, z) 
points which defines vectors vMin and vMax relative to the origin. Define a reference 
direction vector vRef = vMax - vMin. Project all unsorted points onto this reference 
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direction to find the point with the smallest projected distance. Call this spatial location 
vPt Start. This point is the 1 st member of the group. Next compute the distances from 
vPt Start to all other unsorted points. Order this array by distance from close to far using 
a sorting algorithm such as quick sort. Fill region with closest unsorted points. 

Terminate when desired group size is obtained, or if the next candidate group point is 
further than a specified tolerance. Repeat procedure for next group. 

For surface geometries, this procedure produces cobblestone geometric groups, much like 
a stonemason placing bricks starting from an initial vertex and extending out to arms 
reach, see Figure 2. Each group is composed of members who are sorted in distance in a 
smooth regular fashion from near to far. The specified limit distance for a group to 
accept a new member is that the candidate point must be closer than typically three 
characteristic dimension lengths. The characteristic dimension length is defined as the 
average triangle leg length as defined in the geometry mesh. 

Once unknowns are grouped, the interaction between a pair of groups becomes a block in 
the system matrix such that block Zjj is the interaction between the i th and j th groups. 
Self-interaction blocks are on the diagonal, near interactions close to the diagonal and 
distant interactions far from the diagonal of the system matrix. 


Matrix Compression and Low Rank Outer Product Approximations: Central questions 
at this point are three: 1) How can we tell if a matrix block A is compressible? 2) What is 
a low rank approximation to A if A is compressible? And 3) what is the error in the low 
rank outer product approximation to A ~ Au Av? 

The starting point for this discussion is to review Singular Value Decomposition (SVD) 
theory from linear algebra. The results of a SVD show if a matrix is compressible and 
how a low rank outer product can approximate a compressible matrix. We note that SVD 
is a backdrop discussion, because in actual practice the Adaptive Cross Approximation 
will be used to actually compute low rank approximates, for reasons to be evident shortly. 

Any matrix can be SVD factored into the product of three matrices [14, 15]. The SVD of 
complex matrix (not necessarily a square) A e C mxn is 


A= UI V h 


111 u y 


u 


(Tj 0 


0 

0 


0 

0 


( 1 ) 


where U e C mxm and V h e C nxn , and X e R mxp is the diagonal matrix of real singular 
values a sorted from high to low. The U and V h (the superscript h indicates that this is a 
Hermitian conjugate matrix) are unitary matrices and the singular value matrix X has only 
real diagonal entries ordered from maximum to minimum 
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(Jj > (J 2 > . . . > <T p > 0 where p - min(m,n) 


( 2 ) 


The shape (number of rows and columns) of X is the same as A. The unitary matrices U 
and V h are square of size m x m and n x n respectively. The column vectors Uj are the 
columns of U and the row vectors v, are the rows of V h . The Uj’s are not only 
orthogonal, but also orthonormal to each other, similarly with the Vj’s. The columns of U 
and rows of V form a complex basis set of vectors in multi-dimensional space in the 
multi-dimensional spaces C mxl and C lxn , respectively. 

The individual diagonal elements of the singular value matrix X, Oj, are the singular 
values of matrix A. The rank of A is r, the number of ‘non-zero’ singular values. 

The SVD of A can also be written as the sum of r rank-one outer product matrices due to 
the properties of X [14, 15] 


a= Z (T / u / v / = E^ 


7=1 


7=1 


u 


[v, ] 


( 3 ) 


where Uj and Vj are the individual columns and rows of U and V h respectively and the 
sum is over the r non zero singular values. 

If the above summation is truncated to index v with v <r, then this V th partial sum 
captures as much of the energy of A as is possible for any approximation of rank v with 
“energy” defined by either the 2-norm or the Frobenius norm [15]. 

Now we have to look at this from a practical computational perspective where we have 
finite computer arithmetic and finite and/or desired precision in computations of the 
various aspects of our problem. 

The rank of A is r, the number of ‘non-zero’ singular values [15]. From a practical 
perspective, Trefethen and Bau [15] also define the ‘numerical rank’ of A as the number 
of singular values greater than some judiciously chosen value or tolerance. The chosen 
tolerance is, of course, up to the user and the problem at hand. Consequently ‘numerical 
rank’ and tolerance are related. 

Matrix A is compressible if its singular values decrease rapidly. In this case where the 
singular values drops to levels close to 32 bit machine precision, the sum of rank one 
outer products of u and v with singular value coefficients Oj can be truncated at some 
index value k < r (where r was the number of non zero singular values of A and is the 
rank of A). In this case, we can approximate A with its truncated SVD over k singular 
values rather than all r singular values: 
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A = U E V h 
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( 4 ) 


If the singular values drop rapidly, then k is much smaller than r, k « r, and the outer 
product UV approximation to A has low ra nk . 


In the SVD summation form (3) we have the truncated SVD approximation as the first k 
outer product sums 


u , 


u, 


A ~Z' T / U / V /=1>. 


[v, ]- 


j= 1 


M 


U ; 


U, 



Vi 

<y 

V; 

J 

J 


V, 


( 5 ) 


In this case, matrix A with original rank r can be approximated by an outer product of a 
column dominant matrix times a row dominant matrix. The rank of the approximation is 
k. This approximation is reasonable provided matrix A is compressible, i.e., its singular 
values rapidly decrease in value, a r « Gk- 

For a typical MOM problem, what is the behavior of the singular values of spatially 
blocked (grouped) source and test functions? A practical MOM block matrix example is 
shown in Figure 3 where the ordered singular values are plotted on a logarithmic scale. 
This matrix block is the interaction expressing the electrical influence of a source group 
of n = 214 unknowns on a distant test group of m = 220 test functions. Here we see the 
singular values decrease exponentially to near 32-bit machine precision by the 30 th 
singular value out of a possible 214 singular values. This matrix block is thus 
compressible, k « m, and can be approximated as the outer product of a m x k column 
dominant matrix times a k x n row dominant matrix where k is at most 30 and perhaps 
less based on the desired precision for the approximate outer product. Thus we 
immediately see one rationale for using low rank approximates in our computations. The 
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memory storage for column like U and row like V is potentially very much less than for 
the original block. 


What was the ‘precise’ rank of the original 220 x 214 matrix in the above example, i.e., 
the number of non-zero singular values of the SVD expansion? Certainly finite computer 
arithmetic calculations will never produce this answer. But we see that from a practical 
point of view that this matrix is rank deficient [1], that is out of a possible 220 singular 
values only about 30 are greater than levels close to 32-bit machine precision. 

Thus a matrix is compressible if its singular values drop rapidly in which case the 
truncated SVD A~USV approximation to A is of low rank. 

For a compressible matrix, the SVD computes the best low rank approximation to A [15]. 
Best meaning for a given error in either the 2-norm or Frobenius norm, the SVD produces 
the smallest value of k for the number of columns in U and rows in V for any choices of 
k orthonomal columns in U and k orthonormal rows in V. 

The SVD is, however, not the only tool available to compute a low rank outer product 
approximation to A. QR factorization is another approach and the Adaptive Cross 
Approximation (ACA) is yet another. 

The SVD is not a practical approach to compute this approximation for two important 
reasons: 1) All elements of A must be known prior to computing its SVD; and 2) The 
SVD operations count for a square matrix is of order m 3 , i.e., the SVD is an expensive 
computation. Similar issues exist with QR approaches. 

The ACA [2-5] on the other hand is a very practical approach for computing an outer 
product low rank approximation because only a few rows and columns of A are needed 
for the computation and the operations count is much smaller than required by SVD 
(discussed below). 

The next issue concerns the error in the truncated outer product (low rank) approximation 
for A denoted by A c (k) where 


Au 


A c (k) - Au Av = 


[Av 


] = A 


( 6 ) 


where column like Au is m x k and row like Av is k x n and their product matrix Ac is 
the low rank outer product approximation to A. 

The fractional error tolerance 8 of this approximation is measured in terms of Frobenius 
matrix norm, namely, 
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< £ 


( 7 ) 


A ~ A c (k) 

||a|| 

where the numerator is the norm of the approximation error. 

If matrix Ac = Au Av represents A to this tolerance, we say that A can be approximated 
by a matrix of ra nk k for tolerance e. In this sense, ra nk of the approximate matrix Ac 
and tolerance are directly related. We also note that for a given tolerance 8, the rank of 
the outer product approximate using the SVD will be less than with using the ACA. That 
is, the SVD computation is the ‘best’ Frobenious low-rank approximate to A for a given e 

[15]. 

If the ordered singular values decrease rapidly, then A can be well approximated by an 
outer product approximation of low rank; and, therefore, the matrix A is compressible. 
And lastly, the approximation error is measured using matrix norms. 


Rank Fraction 3 : When evaluating memory storage of compressible block matrices using 
the ACA UV form, one needs a relative evaluation of the reduced memory storage for the 
Zu Zv approximation relative to the memory required to store Z. The ratio of the storage 
for Zu and Zv compared to the storage for Z is a normalized measure of the 
compressibility of matrix Z. In this report, this ratio is called rank fraction and is defined 
as the memory storage for Zu and Zv, k(m+n), compared to the m*n storage for Z. This 
ratio is the rank fraction RF, 


RF = Hm±n)_ 

mn 


( 8 ) 


If k « m,n, that is, Z is of low rank, then RF is much less than unity. If m = n, and Z is 
full rank, that is k = m, then RF = 2, and the UV representation takes twice the storage of 
Z. 

In this work we will express rank fraction on a lOlogio scale in decibels (dBrf) (since 
single block rank fractions have approached 0.01), that is 


dBrf = 101og 10 


r k(m + nf 
y mn j 


(9) 


Thus a rank fraction of 0.01, 99% compressed, has a rank fraction on this scale of -20 
dBrf. We will have occasion to see such values for individual block matrices. 


3 The term rank fraction and matrix block compression will be used interchangeably in this report. 
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Before we discuss the ACA efficient computation of compressed low rank matrices, let 
us first examine the rank fraction for various components of the MOM system matrix that 
show they are indeed compressible. 


III. System Block Equation Compressibility Maps 

In this report we show results for the frequency domain electric field integral equation 
(EFIE) for PEC surfaces. Rao-Wilton Glisson (RWG) triangle basis functions [11, 12] 
with Galerkin test functions are used so that a symmetric block system results where the 
unknowns have been grouped into local spatial regions. The self and near self-terms are 
computed using the radial-angular approach described in [13]. 

Matrix element integration uses three points each in source and test triangles. For self 
and near self, each triangle is split into three parts and a fourth-order Gaussian quadrature 
is used for the radial integration [13]. Self-term symmetry is forced by swapping source 
and test triangles and using the average. 

In the symmetric matrix results to follow, both lower and upper triangles will be shown 
for ease of presentation and understanding. Of course, they are just the transpose of each 
other. 

Off-diagonal blocks of the impedance Z matrix have been known to be compressible with 
low rank approximates [1-6, 9, and 18]. How about the other system matrices, such as 
those comprising the LU factorization of Z, the RHS voltage vector / matrix when V is 
due to many incident plane waves, and J, the current solution? 

Z J = LU J = V. (10) 

Let us consider two electrically large geometries, SLICY and Open Pipe, Figures 4 and 5, 
where the cobblestone region grouping has been applied. We will examine the rank 
fraction of the system LU blocks. 

Rank and tolerance are related. In the rank fraction maps shown in this report, the 
tolerance is relative to the norm of the largest block in a block row, i.e., the diagonal 
block. 


SLICY: This target is composed of two vertical circular interacting cylinders, one of 
which is a cavity, and of a corner reflector interacting with the cylinders. These multiple 
bounce interactions are principally specular in nature. The number of unknowns is 
90,71 1, maximum group size is 800, and average triangle edge length is 0.1 17 A,. The 
ACA approximation tolerance for the LU factorization matrix blocks was 10" 4 while the 
ACA tolerance for the solve V and J blocks was 10 3 . 
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The block system Z rank fraction is shown in Figure 6 on a dBrf scale. Clearly the ra nk 
fraction of the interaction matrices decrease as the interaction distance increases. The 
sparseness 4 of Z is 95%. 

The block system LU factored matrix rank fraction is shown in Figure 7, also on a dBrf 
scale. The LU off-diagonal blocks are low rank, but not as low as the original Z blocks 
due to factorization fill in. The average sparseness of the entire matrix is now 90%. 

The important observation is that for this highly interacting target, the LU factored form 
is still 90% sparse (as comprised of individual low rank blocks). 

What about the RHS V matrix and the current matrix solution J? Are these 
compressible? If the problem is antenna or bistatic scattering with only a few RHS’s, 
then the answer is no. How about monostatic scattering where there are many RHS 
incident plane waves? Shown in Figure 8 is the rank fraction map for 250 angles, two 
polarizations each, so that the number of RHS = 500 and V e C 90 ' 71 lx50 °. 5 The rank 
fraction of the blocks comprising V, shown in Figure 8 for a tolerance of 8 = 10' 4 , is 
approximately -8 dBrf which represents a sparseness of 86%. Clearly V is of 
compressible. 

What about the current solution matrix J e C 90 ' 71 1x500 for this monostatic case? The 
block rank fraction for J is shown in Figure 8. It too is compressible with an average 
block rank fraction of approximately -3 dBrf that represents a sparseness of 55% 
compared to 86% for V. Because of fill in, current solution blocks are typically not as 
sparse as RHS forcing matrices V. 


OPEN PIPE: This geometry has many scattering features: internal wave guide 
propagation and/or cavity multi-bounce propagation, external traveling waves, leading 
and trailing edge rim diffraction, creeping waves, and specular scattering mechanisms 
depending on plane wave excitation angle. The open pipe represents a robust target for 
computing sensitive scattering mechanisms. This pipe has a length of 36” and the inn er 
and outer diameters are 3.87” and 4.0”. In the rank fraction maps to be shown, the 
excitation frequency is 6 GHz, the total number of unknowns is 92,220, the maximum 
group size is 800 unknowns per group and the LU AC A tolerance is e = 10~ 4 . 

The open pipe block system Z rank fraction is shown in Figure 9 on a dBrf scale. Clearly 
the rank fraction of the interaction matrices decrease as the interaction distance increases. 
Overall sparseness is 95%. 


4 The sparseness of the compressed Z matrix is defined as the total memory storage, including the non- 
compressed diagonal blocks, relative to the storage required for uncompressed Z matrix. 

5 The number of RHS per solve is sometimes less than the total number of RHS for a given problem. The 
number of RHS per solve depends on available computer memory. Thus for many RHS and limited 
memory, a number of solves are required. Sparseness for V and J here is stated for the number of RHS 
solved per group. 
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The block system LU factored matrix rank fraction is shown in Figure 10, also on a dBrf 
scale. The LU blocks are low rank, but not as low as the original Z blocks due to fill in. 
Overall sparseness drops to 92%. 

The monostatic RHS V g C 92 ' 220x50 ° is compressible, see Figure 11. Overall V sparseness 

OO OOA rAA 

is 96%. The corresponding current solution Jg C ’ x , is also compressible 85% 
sparse, but not as low as V due to fill in. 

These results for two distinctly different interacting target types clearly show that the 
system equation block matrices for Z, L, U, V and J are compressible. The next question 
is, can we take advantage of this observation for reducing memory and speeding up the 
LU factor and solve? 


IV. Adaptive Cross Approximation 

It should be clear by now that for electrically large problems, the various block 
components of the system matrix ZJ = LUJ = Vare compressible when unknowns are 
spatially grouped. The next question is how do we take computational advantage of this? 
Clearly performing a singular value decomposition (SVD) on each block would be a 
possibility, but that has a complexity of m 3 per block, not very efficient even though that 
was the approach in [19] for compressing the Z matrix before the advent of ACA. One 
could also compute low rank blocks using the Modified Graham Schmidt (MGS) 
approach as was done in [18]. But that also is of significant complexity and is no longer 
suggested by the authors [1]. 

What is needed is an approach to compute low rank block approximates that do not 
require first filling either the Z or LU blocks and has a low operations count. See 
Appendix for detailed explanation of the symmetric LU block factorization. 

The Adaptive Cross Approximation (ACA) algorithm as recently developed by 
Bebendorf [2] and further discussed by [ 1 , 3-6, 9] is just such an approach. 

The reader can consult the references just cited for the ACA details, but briefly, the ACA 
method computes the compressed approximate form of matrix A ~ Au Av using only k 
rows and k columns of A to obtain Au and Av. The ACA operations count [2] is of order 
0(k 2 (m+n)) where Ag C mxn , Aug C mxk , Avg C kxn . If A is low rank, k«m,n, then the 
ACA algorithm has a very small operations count, and more importantly, requires only a 
small fraction of A for input. 

The ACA starts its 0 th approximation by choosing a row and column of A to form the first 
outer product uo Vo approximation. These become the first row and column members of 
Au and Av. Adding more columns to Au and rows to Av makes successive 
improvements to reduce the approximation error. These new rows and columns are based 
on a maximum element pivoting strategy for choosing the next row and column [1,2]. 

This has the very practical effect of speeding the convergence. Successive improvements 
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converge when the norm of the k th Uk Vk outer product’s next computed term is less than 
the tolerance e times the norm of A (or in the MOM equation case, less than the norm of 
the largest row block norm). The norm of A, when required, is computed using a 
recursive approach during the ACA loop [2], Note that A does not have to exist in full 
form for the ACA operation provided one has a subroutine to compute the few needed 
rows and columns of A. 

An ACA example of compression for the C x low ra nk MOM matrix of Figure 3 is 
shown in Figure 12 where the ACA was used to compute the compressed representation. 
Just 20 terms are needed for a relative tolerance of 10' 5 . Also shown are the stepwise 
errors relative to the ACA recursive norm [3] and the actual norm. The singular values 
are also plotted. 


Operations Count Efficiencies 

Rank fraction is a direct indicator of memory storage for low rank matrices, and is one of 
the key benefits of working with compressed representations. Operations (ops) count 
reduction is the other key benefit. 

When A is represented as a UV outer product with k « m,n, we say matrix A is 
compressed. In the notation that follows, we will write compressed matrices 
as A ~ Au Av . 


Matrix multiplication of compressed matrices is very efficient if done in the proper order. 
In the following example, we will multiply compressed matrix A = Au Av times a 
matrix B to form matrix C = A B. We will assume that C is used in its compressed form 
Cu Cv. 


Case One : Matrix B is not compressed. We compare the operations count for the 
product C = AB versus the approximation Cu Cv = Au Av B. For A e C mxn and B e 
C nxp , the product operations count to compute C is (mnp). 


The compressed multiply has the form 


C = A B 
C - Cu Cv 
A B ~ Au Av B 



Au 


f 


\ 


Cu" 

Au Av B = 



[Av ] 

B 


= 





V 


) 




[Ci 


( 11 ) 

] 


In this case, Cu = Au and the ops count for Cv = Av B is (knp). 
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The ops efficiency for computing compressed C versus full C = AB is 1 st order small in 
(k/m), 


knp 

mnp 


- — « 1 if k « m . 

m 


( 12 ) 


Case Two : Matrix B is compressed and available, B = Bu Bv where Bu e C nxr and Bv 
e C ,rxp with r « n,p. The operations count for compressed C follows from 


C = AB 

C « Cu Cv 

AB = Au Av Bu Bv 


Au Av Bu Bv = 


Au 

f 

Bu 



Au 


[Av ] 



[Bv ] = 



V 


j 




[T ][Bv ] 


(13) 


Cu 


[Cv 


] 


The intermediate matrix T = Av Bu is of an inner product form with a reduced ops count 
of (knr). 

The ops count for multiplying the small matrix T with Au is (mkr) or with Bv is (rkn). 

The ops cost to compute Cu Cv is { knr + mkr } or { knr + rkn } either of which is much 
less than computing full C with ops of (mnp). 

The ops efficiency for computing compressed versus full C is 2 nd order small 


knr + mkr kr(m + n) , 

— -«1 ifk,r«m,n,p. 

mnp mnp 


(14) 


If A and B are square with m, n, and p equal, the ops count efficiency is 2 nd order small 
(2kr compressed operations « m 2 uncompressed operations), 


knr + mkr kr2m 2 kr . . „ , 

= — r— = — - « 1 if k,r « m . 

mnp m m 


( 15 ) 
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The key observation is that compressed matrix multiplication is very operations count 
efficient as well as memory efficient. 

When working with compressed matrices in a multiplication sequence, one always tries 
to minimize the operations count by choosing the multiply order that is of the inner 
product versus outer product form since the inner product form has significantly fewer 
operations. 


Operations Count Metric 

Just as low rank matrices save significant memory storage, they also save significant 
number of operations count in the compressed block LU factorization. A block wise 
measure of reduced ops count is the ratio of compressed to non-compressed LU 
factorization operations, measured on a dB scale (10 logio ), 


dB _ ops _ reduction = 10 log 10 


/ AC A LU Factor ops 
v Normal LU ops y 


(16) 


where the numerator is the LU block operations count using compressed matrices to 
compute the compressed block [Uu Uv]i ; j , 


[UuUv] - 



(17) 


and the denominator is the block operations count required to compute the un- 
compressed LU factor Uy using non-compressed blocks, 

(-1 

U =Z -Yu r D " 1 U . (18) 

V V /—! V PP PJ ' ’ 

p = 1 


An example of the ops count reduction is shown in Figure 13 for the 90,71 1 unknowns 
SLICY problem where reductions of up to -25 dB are seen. 


V. Validation Results for SLICY 

The computational times reported here were obtained using a PC Workstation with two 
Xeon core 2 duo processors (4 cores total), 16 GB of memory, and four SCSI 15k rpm 
disk drives set up in RAID 0 (disk stripe) mode. The operating system was Windows XP 
64. 
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SLICY [20] is a target that has multiple interactions between the ground plane, two 
vertical cylinders (one of which is a cavity), a dihedral, and two trihedrals, Figure 4. The 
Mercury MOM model description and run parameters are shown in Table 1. 


Table 1 SLICY Validation Run Parameters 


Geometry 

SLICY 

Number of Unknowns 

90,711 

Frequency 

514.3 MHz 

Average Edge Length (X) 

0.117 

Couples per X 2 

259 

Maximum region size 

800 

Number of RHS 

722 

ACA factor tolerance 

10' 4 

ACA solve tolerance 

lO' 4 

Z sparseness 

95% 

LU factor sparseness 

90% 

Voltage sparseness 

86% 

Current solution sparseness 

55% 

Total wall run time 
(PC Workstation) 

0.78 hr 


The full polarization scattering matrix for backscatter was computed for an azimuth cut 
with 361 angles (722 RHS’s) at an elevation angle of 15° look-down. The reference full 
matrix computation 6 was computed with the method of moments code named CARLOS 
4.4 using the iterative out-of-core solver option with a block LU (BLU) pre-conditioner, 
one polarization per run. Run time was approximately 48 hours on a Silicon Graphic 
Incorporated (SGI) R12000 processor. SLICY measured data were compared to 
CARLOS predictions in [21]. 

The polarization scattering matrix results for HH, VV, and cross polarization VH, HY of 
the compressed ACA MOM solution virtually lay on top of the full matrix solution of 
CARLOS, Figures 14-16. 

This agreement clearly shows that a very significant fraction of the MOM system 
equation matrix was low rank and most of the matrix elements do not contain useful 
problem information and can be expressed in compressed form. Spatial grouping, low 
rank, and ACA LU factor / solve can effectively be used to eliminate a large fraction of 
the problem which does not contain useful physics. This results in much less memory, 
much faster run times, and the ability to use inexpensive computer platforms. 


6 Thanks to Steve Carter from the National Ground Intelligence Center for providing the CARLOS 
electromagnetic predictions of the SLICY model. 
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VI. Complexity Results for Open Pipe 

Complexity metrics using only the number of problem u nkn owns N have been the 
traditional approach for measuring the efficacy of “fast” methods and that is the approach 
that will be taken in this work. 

However, it is worth noting that complexity is a poor measure since the following 
parameters significantly influence the efficacy of a given “fast” method: 

Surface area measured in ?i : Complexity involves the number of unknowns N in a given 
problem as a measure of problem size. A much more robust approach would be to use 
the surface area measured in X 2 (for surface integral equations). This metric would allow 
taking into account benefits due to higher order basis/test functions, better integration 
techniques and most importantly, reduced sampling in terms of unknowns / X 2 . 

Geometry mutual coupling: The degree of mutual coupling in a given geometry can 
influence complexity. This would influence the time need for iterative solvers to 
converge, while in direct methods, it would influence the amount of compression for the 
LU factor and solve matrices. 

Computer hardware cost : Can the algorithm be used on an inexpensive PC, a 
workstation, a cluster of networked computers, or a super computer? Computer costs 
involve the amount of memory and the number of cpu’s and hard drives. 

Wall clock time: This is not cpu time, but rather the time between when a user submits a 
job and the time results are obtained. Reduced wall clock time is associated with 
increased computer resource expense. Reduced wall clock time allows more design 
space iterations and is critical for applications using computational methods. 

Computation time can also be influenced by matrix algorithm efficiencies. Algorithms, 
which make use of highly optimized matrix libraries such as Level 3 BLAS and Lapack, 
can have significant impact on reducing time even though operations count remains 
constant. 

Accuracy: Accuracy affects run times. “Fast” methods always involve tolerances. In 
our case, we have an LU factor ACA tolerance and a RHS ACA tolerance for filling the 
V matrix, and for the forward and back solves. Tight tolerances will lead to greater run 
times. 

“Fast” methods need to be compared against a metric that takes into account these issues; 
however, the following complexity metrics use only problem size as measured by N, the 
number of unknowns. 

Open Pipe Results: The open pipe geometry [22], outlined previously, was run for 
fourteen frequencies, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 18 and 20 GHz computing 
backscatter RCS over 0 to 180 degrees. Unknown sample edge lengths averaged 0.1 A 
(380-400 unknowns/^ 2 ). The number of right hand sides ranged from 3 10 to 6162. The 
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ACA convergence tolerance 7 was 10' 5 of the norm of the diagonal block (in the block 
row). The ACA tolerance for solve, for computing V, for LU forward and back solve to 
obtain J, was 10' 4 . 


Table 2 Open Pipe Complexity Run Cases 


Geometry 

Open Pipe 

Number of Unknowns 

2,592- 1,025,109 

Frequencies 

1 to 20 GHz 

Average Edge Length (X) 

0.098 -0.1 

Couples per X 2 

410-383 

Maximum region size 

2000 - 3000 

Number of RHS 

310-6142 

ACA factor tolerance 

10' 5 

ACA solve tolerance 

lO' 4 

Z sparseness 

63% - 99% 

LU factor sparseness 

62% - 97% 

Total Wall Time 

17 sec - 121 hr 


Several of the very largest problems had to store the LU factorization to disk due to the 
limitation of installed memory. This has the effect of longer LU factor times due to the 
slower disk 10 access. 

The following comparisons are relative to the standard non-sparse complexity 
methodology for a LU MOM solution. The system matrix memory required for ACA LU 
factorization scales as N ' , Figure 17, compared to N for the standard factorization. Fill 
time for the Z matrix, as required for ACA LU factorization input, Figure 18, scales as 

13 2 

N ' , also compared to N for the standard fill time. 

ACA LU factorization time scales as N 2 ' ’, Figure 19, compared to N 3 . Solve time per 
RHS vector, Figure 20, scales as N compared to N . 

The overall problem wall clock time scales as N 19 , Figure 21. This is the time for the 
complete problem. 

The 157,059 unknowns’ problem, at a frequency of 8 GHz, with 2466 RHS was 
completed on the PC Workstation, in core, with a wall time of 1.28 hours. 


VII. SLICY and Open Pipe Run Comparisons 

A comparison of these two targets, each with approximately the same number of 
unknowns is instructive, Table 3. Each has distinctly different scattering mechanisms. 


7 The Open Pipe has more non-specular low-level scattering mechanisms than SLICY, which has mostly 
high-level specular mechanisms. Thus the Open Pipe requires tighter ACA factor and solve tolerances. 
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Each has different average edge lengths and ACA tolerances for LU factor. Each has 
different surface areas measured in square wavelength. 


Yet, when matrix storage, matrix fill and LU factor and total run time are compared 
normalized to wavelength, each of these parameters are roughly equivalent. 


Table 3 Comparison of similar Open Pipe and SLICY runs 


Geometry 

Open Pipe 

SLICY 

Surface Area 

231 k 2 

350 k} 

Number of Unknowns 

92,220 

90,711 

Average Edge Length (k) 

0.1 

0.117 

Couples per k 2 

400 

259 

Maximum region size 

800 

800 

Number of RHS 

1850 

722 

ACA factor tolerance 

10' 5 

io - 4 

ACA solve tolerance 

10“ 4 

10“ 4 

Z sparseness 

95% 

95% 

LU factor sparseness 

92% 

90% 

Voltage sparseness 

96% 

86% 

Current solution sparseness 

85% 

55% 

LU memory (GB) 

2.8 

3.3 

LU memory per a 

0.012 

0.0093 

Matrix fill & factor (hr) 

0.446 

0.70 

Matrix fill & factor per A? 

1.9 10 s 

2.0 10 3 

Total run time (hr) 

0.512 

0.78 

Total run time per 

2.2 10 s 

2.2 10r 3 
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Summary 


It is clear that electrically large problems with unknowns grouped in local spatial regions 
have a system matrix Z and its lower/upper (LU) factored forms that are comprised of 
compressible off-diagonal blocks. For monostatic scattering, where the right hand side 
(RHS) voltage matrix is composed of many incident plane waves, blocks of V are also 
compressible as well as the current solution J. 

Compressible matrices and their low rank approximations fundamentally means that most 
of the Method of Moments (MOM) system matrix equation elements, before 
compression, contain very little physical information. The Adaptive Cross 
Approximation (ACA) can be used to extract and compress the system equation 
elements, keeping only the necessary physics. The ACA can be used for all steps of the 
solution: filling the Z matrix; LU Factoring; and solving. This results, depending on 
problem size, in orders of magnitude less memory and run times. One can use 
inexpensive computer resources for problems that previously took super computers. Or 
solve problems that could not be solved at all. 

It is also important to point out that this ACA LU factorization and solution is completely 
within the context of standard Method of Moments. There are no requirements for new 
basis or test functions, no requirements for far or near field distinction, no requirements 
for auxiliary “equivalent” sources on a grid, and no requirement for analytic Green’s 
function expansion. 

This direct LU factor and solve technique is validated against another well-known code 
for a 90,7 1 1 unknown SLICY geometry. Complexity of this approach is studied by 
numerical experiment for specific error tolerances for an open pipe scattering target with 
number of unknowns ranging from 2,592 to 1,025,109 and the number of RHS varying 
from 310 to 6,162. Matrix fill time scales as N 1 ' 3 , LU matrix memory storage scales as 
N ' , LU factor time scales as N ' , and the time per RHS solve scales as N ' . Complete 
solution clock time scales as N 1 9 . 
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Appendix 


Complex Symmetric LU Block Factorization and Solve 

The formulas used for block symmetric LU factor and for forward / back solve are 
obtained by comparing the symmetric factor form to the standard non-symmetric scalar 
factor forms. 

Block forms of scalar LU expressions are straightforward. One simply uses block 
matrices in place of scalar entities. The only difference is for division, which in block 
form becomes a matrix inverse, i.e., 1/d => D' 1 . And the order of operations must follow 
matrix rules for multiply. 

Block factorization [14-17] of a complex symmetric matrix is 


z = ldl t = u t du 


(A-l) 


where T represents transpose. L is lower triangular, U is upper triangular and D are the 
diagonal entities (scalar or blocks). Either L or U is required when Z is symmetric. 

Our goal is to obtain the formulas required to compute U and to obtain the solution 
formulas. We will do this by comparing the above expression with that for non- 
symmetric scalar matrices. 

Scalar Non-Symmetric LU Formulas 

The standard LU non-symmetric factorization for a scalar matrix is 


~Zn 

7 

^12 

7 

^13 


"l 

0 

o’ 


u n 

U 12 

U 13 

^21 

7 

^22 

7 

^23 
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hi 

i 

0 


0 

1^22 

U 23 

_^31 

7 

^32 

7 

^33 J 


J31 

^32 

1 


0 

0 

U33 


(A-2) 


The column wise factorization formulas for u and 1 are [15], 


1 

U ij ~ Z ij ~ 1 ip u pi (A-3) 

p = 1 
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( 



\ 

Z ij~YjiV U Vj 

p=l J 


(A-4) 


where ujj are the diagonal entries. 

The solve process has forward and back steps. Given Z J = L U J = V, the solution for J 
starts with L X = V where X = U J. This is the forward solve process that uses the lower 
triangle matrix L starting with the 1 st row, i =1, 


i-i 


x, 


p = i 


i P x P 


(A-5) 


The back solve process for J uses the upper triangle matrix U, starting with the last row, 
i=n: 


Ji ~ ' 


u„ 




'i 


U 1 

ip J p 

p=i+l j 


(A-6) 


The next step is to compare these standard expressions with those for symmetric block 
matrices. 

LU Factorization of a Complex Symmetric Block Matrix 

This work uses the upper U form for symmetric factorization, and for purposes of 

comparison, the symmetric block will be identified as U' as the block contents will not be 
the same entities as the u’s of non-symmetric scalar expressions, 


Z = U’ t DU\ 


(A-7) 


Expanding to full form 
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(A-8) 
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The diagonal entries of U' are unity. Putting U ,T DU' into LU form, we first expand DU' 
to obtain the form 


X 

Z 12 

Zu" 


’ 1 

0 

o’ 


Dn 

DnU' 12 

DnU' 13 

Z i2 

^22 

Z 23 

= 

U' T 
1 12 

1 

0 


0 

D 2 2 

®22^ 23 

Z 13 

7 t 

^23 

Z 33_ 


U' T 

^ 13 

U' T 

L 23 

1 


0 

0 

D 33 


Making the substitutions 

U = D U' 

•j ii ij 

U = D 
11 11 

L.. = U' T = [d^u.T = u^d: 1 

ij ji L ii ij J jin 

we arrive at the standard LU form 


_Z 11 

Z 12 

z 13 ' 
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Un 
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U 13 ' 

Z U 
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with factorization 

U.. = Z..-Vu r D -1 U . 

IJ V t—l l P PP PJ 


and solve 


(A-9) 


(A- 10) 


(A-l 1) 


(A- 12) 
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; Forward 


i-i 


x =v-Yu r d‘x 

i i v pp i 

p = i 


(A- 13) 


j,=d? 


V P=i - 1 


; Back 


which completes the block symmetric matrix LU factor and solve expressions. 

AC A LU Block Factorization 

ACA factorization computes blocks of U in low rank compressed form 

U = Uu Uv (A- 14) 

which are transpose stored for efficient organization, 


U r = ( [Uu] [Uv] ) r = [Uv] r [Uu] r (A- 1 5) 


This is a straightforward application of the ACA algorithm to the LU block factorization 
computational expression, 


U„=[UuUv], .= Z 9 -£[uv T Uu T ] t ,D^[UuUv] J , J (A- 16) 

p=\ 


The ACA algorithm requires only selected rows and columns of Uy to obtain the 
compressed form Uu Uv. Only a few selected rows and columns of Zy must be 
computed. 

It is important to note that the blocks of the original Z matrix are never filled. Only the 
Zy rows and columns, needed by the ACA algorithm to compute the Uy block 
factorization, are filled. This further means that this procedure for ACA factorization 
does not have a direct analogy to the direct matrix fill concept. 

Forward and back solve forms are: 
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X,=X,-X(UuUv); D'J, (XuXv) f ; Forward 


•I, = U„' X, - X(UuUv) lt (JuJv)„ ;Back 
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Figure 1 Spatially grouped unknowns lead to low rank block matrices Z, L, U, V, and J for 
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unknowns, 51 groups 


Figure 3 Singular values for a 220 by 214 Z interaction block matrix. Singular values drop to 
machine precision by k = 30. 


Figure 4 SLICY test geometry showing unknowns grouped into local spatial regions 


Figure 5 Open Pipe Geometry showing unknowns grouped into local spatial regions 


Figure 6 SLICY Z matrix rank fraction in dBrf, 90,711 unknowns, maximum group size 800, overall 
sparseness is 95 % 


Figure 7 SLICY LU U block rank fraction in dBrf, 90,711 unknowns, , maximum group size 800 
overall sparseness is 90 % 


Figure 8 SLICY RHS voltage matrix V(black) and current solution J(blue) rank fraction, 20 dB scale 
(0.01 < RF < 1.0), by region. 90,711 unknowns, 722 RHS monostatic incident plane waves. V is 86% 
sparse while J is 55% sparse. 


Figure 9 Open Pipe: Z matrix rank fraction, 20 dB scale (0.01 < RF < 1.0), 92,220 unknowns, 
maximum group size 800, overall sparseness = 95 % 


Figure 10 Open Pipe: factored matrix rank fraction, 20 dB scale (0.01 < RF < 1.0), 92,220 unknowns, 
maximum group size 800, overall sparseness = 92 % 


Figure 11 Open Pipe: V and J rank fraction (dBrf), 6 GHz, 92,220 Unknowns, V is 96% sparse while 
J is 85% sparse. 


Figure 12 ACA norm approximation errors for 220 x 214 Z block of Figure 3 as a function of k UV 
rows and columns. Convergence to 10“ 4 by k = 20. 


Figure 13: SLICY LU operations count, 30 dB scale (0.001 < RF < 1.0), 90,711 unknowns, maximum 
group size 800, tol = 10-4 

Figure 14 SLICY VV polarization, MERCURY MOM and CARLOS predictions overlay 
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Figure 15 SLICY HH polarization, MERCURY MOM and CARLOS prediction overlay 

Figure 16 SLICY HV and VH cross polarization, MERCURY MOM and CARLOS predictions 
overlay 

Figure 17 Compressed matrix storage complexity 
Figure 18 AC A Matrix fill time complexity 
Figure 19 ACA LU factor time complexity 
Figure 20 ACA Solve time per RHS complexity 
Figure 21 Total run wall time complexity 
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Figure 1. Spatially grouped unknowns lead to low rank block matrices Z, L, U, V, and J for 
electrically large bodies (from [1]). 



Figure 2. Cobblestone regions on a flat plate geometry: 24933 unknowns, maximum region size 600 
unknowns, 51 groups. 
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Singular Values for a Numerically Low Rank Interaction Matrix 



Figure 3. Singular values for a 220 by 214 Z interaction block matrix. Singular values drop to 
machine precision by k = 30. 
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Figure 4. SLICY test geometry showing unknowns grouped into local spatial regions. 



Figure 5. Open pipe geometry showing unknowns grouped into local spatial regions. 
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SLICY: Z Matrix Rank Fraction, 20 dB Scale 
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Figure 6. SLICY Z matrix rank fraction, 20 dB scale (0.01 < RF < 1.0), 90,711 unknowns, maximum 
group size 800, overall sparseness = 95 %. 


34 


SLICY: Factored Matrix Rank Fraction, 20 dB scale 



OJCiOe+000 


-2 JOe+001 


50064001 


75064001 


1006400 


-1256400 


-1506400 


-1.756400 


-2006400 




Figure 7. SLICY factored matrix rank fraction, 20 dB scale (0.01 < RF < 1.0), 90,711 unknowns, 
maximum group size 800, overall sparseness = 90 %. 
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Figure 8. SLICY RHS voltage matrix V(black) and current solution J(blue) rank fraction, 20 dB 
scale (0.01 < RF < 1.0), by region. 90,711 unknowns, 722 RHS monostatic incident plane waves. 
Overall V is 86% sparse and J is 55% sparse. 
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Open Pipe: Z Matrix Rank Fraction, 20 dB Scale 
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Figure 9. Open Pipe: Z matrix rank fraction, 20 dB scale (0.01 < RF < 1.0), 92,220 unknowns, 
maximum group size 800, overall sparseness = 95 %. 
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Open Pipe: Factored Matrix Rank Fraction, 20 dB Scale 
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Figure 10. Open Pipe: factored matrix rank fraction, 20 dB scale (0.01 < RF < 1.0), 92,220 unknowns, 
maximum group size 800, overall sparseness = 92 %. 
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Figure 11. Open Pipe: RHS voltage matrix V(black) and current solution J(blue) rank fraction, 20 
dB scale (0.01 < RF < 1.0), 6 GHz, 92,220 Unknowns, V is 96% sparse, J is 85% sparse. 
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Figure 12. ACA norm approximation errors for 220 x 214 Z block of Figure 3 as a function of k UV 
rows and columns. Convergence to 10" 4 by k = 20. 
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dB ops count reduction 
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Figure 13. SLICY LU operations count, 30 dB scale (0.001 < RF < 1.0), 90,711 unknowns, maximum 
group size 800, tol = 10-4. 
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30 


SLICY: W Polarization 



SLICY: HH Polarization 
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SLICY: HV and VH Cross Polarization 



Figure 16. SLICY HV and VH cross polarization, MERCURY MOM and CARLOS predictions 
overlay. 
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Size of LU Memory (GB) 



Figure 17. Compressed matrix storage complexity. 



Figure 18. AC A Matrix fill time complexity. 


LU Factor (Wall Time, hr) 



Figure 19. ACA LU factor time complexity. 
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Figure 20. ACA solve time per RHS complexity. 



Figure 21. Total run wall time complexity. 
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